 true state



Diffusion Guided Adversarial State Perturbations in Reinforcement Learning

Sun, Xiaolin, Liu, Feidi, Ding, Zhengming, Zheng, ZiZhan

arXiv.org Artificial Intelligence

Reinforcement learning (RL) systems, while achieving remarkable success across various domains, are vulnerable to adversarial attacks. This is especially a concern in vision-based environments where minor manipulations of high-dimensional image inputs can easily mislead the agent's behavior. To this end, various defenses have been proposed recently, with state-of-the-art approaches achieving robust performance even under large state perturbations. However, after closer investigation, we found that the effectiveness of the current defenses is due to a fundamental weakness of the existing $l_p$ norm-constrained attacks, which can barely alter the semantics of image input even under a relatively large perturbation budget. In this work, we propose SHIFT, a novel policy-agnostic diffusion-based state perturbation attack to go beyond this limitation. Our attack is able to generate perturbed states that are semantically different from the true states while remaining realistic and history-aligned to avoid detection. Evaluations show that our attack effectively breaks existing defenses, including the most sophisticated ones, significantly outperforming existing attacks while being more perceptually stealthy. The results highlight the vulnerability of RL agents to semantics-aware adversarial perturbations, indicating the importance of developing more robust policies.



Online Learning of Dynamic Parameters in Social Networks

Neural Information Processing Systems

This paper addresses the problem of online learning in a dynamic setting. We consider a social network in which each individual observes a private signal about the underlying state of the world and communicates with her neighbors at each time period. Unlike many existing approaches, the underlying state is dynamic, and evolves according to a geometric random walk. We view the scenario as an optimization problem where agents aim to learn the true state while suffering the smallest possible loss. Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state. We establish a tight bound on the rate of change of the underlying state, under which individuals can track the parameter with a bounded variance. Then, we characterize explicit expressions for the steady state mean-square deviation (MSD) of the estimates from the truth, per individual. We observe that only one of the estimators recovers the optimal MSD, which underscores the impact of the objective function decomposition on the learning quality. Finally, we provide an upper bound on the regret of the proposed methods, measured as an average of errors in estimating the parameter in a finite time.


Towards a Measure Theory of Semantic Information

Coghill, George M.

arXiv.org Artificial Intelligence

A classic account of the quantification of semantic information is that of Bar-Hillel and Carnap. Their account proposes an inverse relation between the informativeness of a statement and its probability. However, their approach assigns the maximum informativeness to a contradiction, which Floridi refers to as the Bar-Hillel-Carnap paradox. He developed a novel theory founded on a distance metric and parabolic relation, designed to remove this paradox. Unfortunately, his approach does not succeed in that aim. In this paper I critique Floridi's theory of strongly semantic information on its own terms and show where it succeeds and fails. I then present a new approach based on the unit circle (a relation that has been the basis of theories from basic trigonometry to quantum theory). This is used, by analogy with von Neumann's quantum probability, to construct a measure space for informativeness that meets all the requirements stipulated by Floridi and removes the paradox. In addition, while contradictions and tautologies have zero informativeness, it is found that messages which are contradictory to each other are equally informative. The utility of this is explained by means of an example.
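
The inverse relation behind the Bar-Hillel-Carnap paradox can be illustrated with a minimal sketch. It assumes the standard textbook measures cont(s) = 1 - p(s) and inf(s) = log2(1 / p(s)); the function names are hypothetical, and the code implements neither Floridi's theory nor the unit-circle measure proposed in the paper.

```python
import math


def content_measure(p: float) -> float:
    """Content measure cont(s) = 1 - p(s) for a statement with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return 1.0 - p


def information_measure(p: float) -> float:
    """Information measure inf(s) = log2(1 / p(s)), in bits.

    A contradiction has p = 0, so its informativeness is unbounded;
    that is the paradox described in the abstract.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p == 0.0:
        return math.inf
    return math.log2(1.0 / p)
```

Under these assumed measures a tautology (p = 1) carries no information, while a contradiction (p = 0) receives the maximum value of both measures, which is the counterintuitive outcome the abstract refers to.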


Decentralized Hidden Markov Modeling with Equal Exit Probabilities

Sui, Dongyan, Zheng, Haitian, Leng, Siyang, Vlaski, Stefan

arXiv.org Artificial Intelligence

Social learning strategies enable agents to infer the underlying true state of nature in a distributed manner by receiving private environmental signals and exchanging beliefs with their neighbors. Previous studies have extensively focused on static environments, where the underlying true state remains unchanged over time. In this paper, we consider a dynamic setting where the true state evolves according to a Markov chain with equal exit probabilities. Based on this assumption, we present a social learning strategy for dynamic environments, termed Diffusion $\alpha$-HMM. By leveraging a simplified parameterization, we derive a nonlinear dynamical system that governs the evolution of the log-belief ratio over time. This formulation further reveals the relationship between the linearized form of Diffusion $\alpha$-HMM and Adaptive Social Learning, a well-established social learning strategy for dynamic environments. Furthermore, we analyze the convergence and fixed-point properties of a reference system, providing theoretical guarantees on the learning performance of the proposed algorithm in dynamic settings. Numerical experiments compare various distributed social learning strategies across different dynamic environments, demonstrating the impact of nonlinearity and parameterization on learning performance in a range of dynamic scenarios.
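
One way to picture a Markov chain with equal exit probabilities is sketched below. It assumes one reading of the term: from any state, the chain moves to each other state with the same probability alpha and stays put otherwise. The function names and this parameterization are illustrative assumptions, not the paper's definitions, and the sketch does not implement Diffusion $\alpha$-HMM itself.

```python
from typing import List


def equal_exit_transition_matrix(num_states: int, alpha: float) -> List[List[float]]:
    """Row-stochastic matrix where every off-diagonal entry equals alpha (assumed form)."""
    if num_states < 2:
        raise ValueError("need at least two states")
    stay = 1.0 - (num_states - 1) * alpha
    if alpha < 0.0 or stay < 0.0:
        raise ValueError("alpha must satisfy 0 <= alpha <= 1 / (num_states - 1)")
    return [
        [stay if i == j else alpha for j in range(num_states)]
        for i in range(num_states)
    ]


def propagate(belief: List[float], transition: List[List[float]]) -> List[float]:
    """One prediction step: next_belief[j] = sum_i belief[i] * transition[i][j]."""
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
```

With three states and alpha = 0.1, a belief concentrated on the first state spreads to roughly (0.8, 0.1, 0.1) after one prediction step, so certainty about the true state decays unless new signals arrive.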


Robust Information Selection for Hypothesis Testing with Misclassification Penalties

Bhargav, Jayanth, Sundaram, Shreyas, Ghasemi, Mahsa

arXiv.org Machine Learning

We study the problem of robust information selection for a Bayesian hypothesis testing / classification task, where the goal is to identify the true state of the world from a finite set of hypotheses based on observations from the selected information sources. We introduce a novel misclassification penalty framework, which enables non-uniform treatment of different misclassification events. Extending the classical subset selection framework, we study the problem of selecting a subset of sources that minimizes the maximum penalty of misclassification under a limited budget, despite deletions or failures of a subset of the selected sources. We characterize the curvature properties of the objective function and propose an efficient greedy algorithm with performance guarantees. Next, we highlight certain limitations of optimizing for the maximum penalty metric and propose a submodular surrogate metric to guide the selection of the information set. We propose a greedy algorithm with near-optimality guarantees for optimizing the surrogate metric. Finally, we empirically demonstrate the performance of our proposed algorithms in several instances of the information set selection problem.
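
The general shape of greedy subset selection under a budget can be sketched as follows. This is a generic greedy routine with a cardinality budget and a toy coverage objective standing in for a submodular surrogate; the names `greedy_select` and `coverage_objective` are hypothetical, and the sketch omits the misclassification penalties and the robustness to source deletions that the paper studies.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set


def greedy_select(
    sources: Iterable[str],
    objective: Callable[[FrozenSet[str]], float],
    budget: int,
) -> Set[str]:
    """Add, one at a time, the source with the largest marginal gain."""
    selected: Set[str] = set()
    remaining = set(sources)
    for _ in range(budget):
        base = objective(frozenset(selected))
        best, best_gain = None, 0.0
        for s in sorted(remaining):  # sorted for deterministic tie-breaking
            gain = objective(frozenset(selected | {s})) - base
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:  # no source improves the objective
            break
        selected.add(best)
        remaining.remove(best)
    return selected


def coverage_objective(covers: Dict[str, Set[int]]) -> Callable[[FrozenSet[str]], float]:
    """Toy submodular objective: number of distinct items covered by the chosen sources."""
    def objective(chosen: FrozenSet[str]) -> float:
        covered: Set[int] = set()
        for s in chosen:
            covered |= covers[s]
        return float(len(covered))
    return objective
```

For example, with hypothetical sources a, b, and c covering {1, 2, 3}, {3, 4}, and {4, 5, 6, 7} and a budget of two, the routine picks c first and then a, since b adds only one new item once c is chosen.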


On Diffusion Models for Multi-Agent Partial Observability: Shared Attractors, Error Bounds, and Composite Flow

Wang, Tonghan, Dong, Heng, Jiang, Yanchen, Parkes, David C., Tambe, Milind

arXiv.org Artificial Intelligence

Multiagent systems grapple with partial observability (PO), and the decentralized POMDP (Dec-POMDP) model highlights the fundamental nature of this challenge. Whereas recent approaches to addressing PO have appealed to deep learning models, providing a rigorous understanding of how these models and their approximation errors affect agents' handling of PO and their interactions remains a challenge. In addressing this challenge, we investigate reconstructing global states from local action-observation histories in Dec-POMDPs using diffusion models. We first find that diffusion models conditioned on local history represent possible states as stable fixed points. In collectively observable (CO) Dec-POMDPs, individual diffusion models conditioned on agents' local histories share a unique fixed point corresponding to the global state, while in non-CO settings, the shared fixed points yield a distribution of possible states given joint history. We further find that, with deep learning approximation errors, fixed points can deviate from true states and the deviation is negatively correlated with the Jacobian rank. Inspired by this low-rank property, we bound the deviation by constructing a surrogate linear regression model that approximates the local behavior of diffusion models. With this bound, we propose a composite diffusion process iterating over agents with theoretical convergence guarantees to the true state.


InfraLib: Enabling Reinforcement Learning and Decision Making for Large Scale Infrastructure Management

Thangeda, Pranay, Betz, Trevor S., Grussing, Michael N., Ornik, Melkior

arXiv.org Artificial Intelligence

Efficient management of infrastructure systems is crucial for economic stability, sustainability, and public safety. However, infrastructure management is challenging due to the vast scale of systems, stochastic deterioration of components, partial observability, and resource constraints. While data-driven approaches like reinforcement learning (RL) offer a promising avenue for optimizing management policies, their application to infrastructure has been limited by the lack of suitable simulation environments. We introduce InfraLib, a comprehensive framework for modeling and analyzing infrastructure management problems. InfraLib employs a hierarchical, stochastic approach to realistically model infrastructure systems and their deterioration. It supports practical functionality such as modeling component unavailability, cyclical budgets, and catastrophic failures. To facilitate research, InfraLib provides tools for expert data collection, simulation-driven analysis, and visualization. We demonstrate InfraLib's capabilities through case studies on a real-world road network and a synthetic benchmark with 100,000 components.


Non-Bayesian Social Learning with Multiview Observations

Sui, Dongyan, Cao, Weichen, Vlaski, Stefan, Guan, Chun, Leng, Siyang

arXiv.org Artificial Intelligence

Non-Bayesian social learning enables multiple agents to conduct networked signal and information processing by observing environmental signals and aggregating information. Traditional non-Bayesian social learning models only consider single signals, limiting their applications in scenarios where multiple viewpoints of information are available. In this work, we exploit, in the information aggregation step, the independently learned results from observations taken from multiple viewpoints and propose a novel non-Bayesian social learning model for scenarios with multiview observations. We prove the convergence of the model under traditional assumptions and provide convergence conditions for the algorithm in the presence of misleading signals. Through theoretical analyses and numerical experiments, we validate the strong reliability and robustness of the proposed algorithm, showcasing its potential for real-world applications.